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ABSTRACT 

We develop two general methods to infer the gravitational potential of a system using 
steady-state tracers, i.e., tracers with a time-independent phase-space distribution. 
Combined with the phase-space continuity equation, the time independence implies a 
universal Orbital Probability Density Function (oPDF) dP(A|orbit) oc df, where A is 
the coordinate of the particle along the orbit. The oPDF is equivalent to Jeans theo¬ 
rem, and is the key physical ingredient behind most dynamical modelling of steady- 
state tracers. In the case of a spherical potential, we develop a likelihood estimator 
that fits analytical potentials to the system, and a non-parametric method (“phase- 
mark”) that reconstructs the potential profile, both assuming only the oPDF. The 
methods involve no extra assumptions about the tracer distribution function and can 
be applied to tracers with any arbitrary distribution of orbits, with possible extension 
to non-spherical potentials. The methods are tested on Monte Carlo samples of steady- 
state tracers in dark matter haloes to show that they are unbiased as well as efficient. 
A fully documented C /Python code implementing our method is freely available at 
a GitHub repository linked from http://icc.dur.ac.Uk/data/#oPDF. 

Key words: methods: data analysis - Galaxy: fundamental parameters - galaxies: 
haloes - galaxies: kinematics and dynamics - dark matter 


1 INTRODUCTION 

Since dark matter does not emit or absorb electromagnetic 
radiation, gravitational modelling is of fundamental impor¬ 
tance to the determination of the dark matter distribution. 
Such modelli ng can be performed using either g ravitational 
lensing fe.g.. lBartelmannll2010l : iHan et al.ll2014D, or the dy¬ 
nami cs of tracers (e.g., stars or galaxies; see ICourteau et all 
120141 for a recent review on galaxy mass inferences). 

One straightforward way to perform dynamical mod¬ 
elling is to fit a proposed phase-space distribution function 
(DF) to the observed positions and velo cities of tracer parti¬ 
cles. Thanks to Jeans theorem (see e.g. lBinnev fe Tremainel 
120081) . which states that functions of the integrals of motion, 
J, are solutions of the Boltzmann equation, one can simply 
consider DFs of the form /(J). Under certain conditions, one 
can invert the observed density profile, p = J f(J )d 3 v, of the 
trace r to construct a specific family of f(J) (e.g .,|Ed dingtonl 
Il91& lOsipkovl Il979l : iMerrittl Il985l : ICuddeford) Il99ll : here- 
after referred to as density profile inversion). The density 
profile inversion depends on the potential of the halo, and 
the resulting / is thus potential dependent. However, some 
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further assumptions about the functional form of /(J) are 
required to perform the inversion. These as sumptions are 
typically motivated either emp irically (e.g., IWoitak et al.l 
120081 : IWilliams fe Evansl 2015 a), or by mathematical sim¬ 
plicity ~(e!g!7[Ewin^^^U] I 2 QQT When proposing the DFs, 
one is free to choose the integrals of motion, to be either clas¬ 
sical integrals such as energy and angular mome ntum, or the 
more theoretically appealing a ctions (see e.g., IPosti et al.l 
120151 : IWilliams fe Eva.nl] l2015bl . for such recent models on 
halo stars). 


In IWilkinson fe Evansl (1 1999T) and IWang et al.l ll2015l) , 
solutions of the form f(E,L) = f(E)L ~ 2,9 were used to 
constrain the potential of the Galactic halo, where E and 
L are the energy and angular mome ntum of tracer parti¬ 
cles. In particular, IWang et al.l J2015I) applied the f (E,L) 
method to mock stellar haloes ( Cooper et al.l 120101 ) con¬ 
structed fr om the Aquarius sim ulations of ACDM galac¬ 
tic haloes dSpringel et alj 120081 ) and found significant bi¬ 
ases in the fitted masses. These biases suggest that the 
proposed f(E, L) DF does not describe well the observed 
phase-space distribution of the mock stars. However, it is 
not clear whether the discrepancy is due to departures from 
dynamical equilibrium by the stars within the halo poten¬ 
tial, or to the lack of generality in the proposed f(E,L) 
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functional form used to describe the distribution (e.g., 
iBinnev fe Mamonlll982l ). In the former case, the stars would 
represent an intrinsically biased tracer, and the modelling of 
their distribution would not give the correct potential. In the 
latter case, one can still hope to find a DF that fits the ob¬ 
served distribution with the correct potential once the extra 
assumptions in the functional form of f(E, L) are relaxed or 
removed. 

A method requiring no assumptions on the form of 
f{E, L) is achievable, which is what we develop in this work. 
The starting point of our method is the definition of tracer. 
We define a tracer as a population of objects whose phase- 
space DF does not evolve with time (i.e., is in a steady state), 
so that modelling their DF at an arbitrary time is generally 
possible and useful. It immediately follows from this defini¬ 
tion that the probability of observing a particle at a position 
on its orbit is proportional to the time it spends near that 
position, i.e., dP|orbit oc df. Formally, this can be shown to 
be a result of phase-space continuity fSection l2.2l) . We give 
a thorough description on this Orbital Probability Density 
Function (oPDF) in Section [5] with further discussion in 
Section ED 

This simple relation actually contains all the infor¬ 
mation required to model the potential of the system. 
We demonstrate this by constructing explicit estimators 
for the potential from the oPDF in Section [4] Expressed 
in action-angle coordinates, where the angles evolve uni¬ 
formly over time, the steady-state distribution is a uni¬ 
form dis tribution in angle, also k nown as the orbital 
roulette jBeloborodov fe Levin|[2004l . hereafter BL04). Two 
minimum-distance estimators have been proposed in BL04 
to infer the potential using the uniform angle distribution. 
Unfortunately, when applied to a ACDM halo potential, we 
find that these phase angle estimators only probe the grav¬ 
ity at (or equivalently, the halo mass inside) a tracer-specific 
characteristic radius of the Navarro-Frenk-White (NFW; 
iNavarro et al.j[l996l . ll99'ii ) type potential, resulting in a high 
degeneracy in the mass (M) and concentration (c) parame¬ 
ters of the halo potential. 

Expressed in the radial coordinate, the steady-state dis¬ 
tribution translates into an orbit-dependent DF in r. For any 
given potential, one can then predict a radial distribution for 
the tracer according to the occurrence of orbits in the data. 
A likelihood estimator can be constructed by comparing the 
observed radial distribution to the predicted distribution. 
This radial likelihood estimator is largely able to break the 
degeneracy in M — c and provides a good constraint on the 
halo potential profile over a much larger radial range. This 
parametric likelihood estimator is described in Section m 

Alternatively, the degeneracy in the phase angle esti¬ 
mators can be utilized to break the degeneracy itself. In 
particular, the degeneracy in the mean-phase estimator is 
so strong that it provides no constraint on the halo mass 
profile anywhere except at the characteristic radius, leaving 
the shape of the halo mass profile unconstrained. Such a de¬ 
generacy can be broken by applying the estimator multiple 
times to subsamples of the tracer in different radial ranges, 
thus constraining simultaneously the halo mass at different 
characteristic radii. At the same time, the perfect degen¬ 
eracy means that profiles of any shape can be adopted to 
fit for the characteristic mass. Fitting two profiles of differ¬ 
ent shapes with the mean-phase estimator, we can find out 


the characteristic mass point by locating the point where the 
two mass profiles intersect. This leads to our non-parametric 
potential profile reconstruction method, in which we fit two 
elementary one-parameter profiles with the mean-phase es¬ 
timator to “mark out” the characteristic mass point in each 
radial bin. This “phase-mark” method is detailed in section^ 

The oPDF describes the conditional distribution of a 
particle in phase space given its orbit. Coupled with as¬ 
sumptions on the prior distribution of orbits, one can re¬ 
cover a full phase-space DF applying Bayes’ theorem. In 
Section ETT] we will show that these distribution functions 
are fully compatible with those constructed from a density 
profile inversion. Such a DF can then be used to fit the ob¬ 
served distribution of the tracer to infer the potential, which 
is the approach taken by conventional DF methods. If the 
assumptions on the distribution of orbits are correct, then 
the DF method is fully compatible with ours. However, our 
method still works even if these assumptions fail, while the 
validity of the DF methods is intrinsically limited by the 
validity of these assumptions. 

There are some DF methods based on more general 
assumption s abou t the distribution of orbits. For example, 
iBovv et al.l (l2010l l generalized the Roulette distribution to 
a Bayesian likelihood estimator by combining the uniform 
distribution of action-angles with the distribution of orbital 
parameters (e.g., ( E,L )). The latter is modelled parametri¬ 
cally or with histograms, and the parameters of the distri¬ 
bution of orbits are further marginalized over some assumed 
priors. They applied their method to infer the potential of 
the S olar system using the planets as tracers. iMagorrianl 
(I2014P also proposed a Bayesian method by modelling the 
distribution of orbits non-parametrically with an arbitrary 
number of Gaussians in action space, and then marginaliz¬ 
ing over the proposed prior distribution of the normaliza¬ 
tion, location and width of the Gaussians. These methods 
are still not assumption-free, because a particular form for 
the distribution of orbits and priors for their parameters 
still need to be assumed. Adopting more general functions 
to describe the distribution of orbits also tends to complicate 
the mathematical and computational aspects of the problem 
tremendously. Compared with these Bayesian marginaliza¬ 
tion methods, our method is much simpler and more intu¬ 
itive. We do not need to model the distribution of orbits at 
all, so our method is truly assumption-free in so far as the 
distribution of orbits is concerned. 

Our likelihood esti mator is closely r elated to 
Schwarzschild’s method dSchwarzschildl Il979l l. a gen¬ 
eral numerical method that solves the p = f f(J)d 3 v 
equation numerically to obtain / as well as the potential. 
Without loss of generality, our method effectively works 
by determining one orbit from the phase-space coordinate 
of each particle, avoiding the numerical search for the 
combinations of orbits used in Schwarzschild’s method. We 
elaborate on this point in Section T6. 3. 21 

In a follow up paper (lHan et al ] 20iE Paper II), we 
apply our oPDF analysis to tracers of Gala xy-sized haloes 
const ructed from the Aquarius simulations (ISpringel et al.l 
l2008h . to study the dynamical status of both the dark matter 
and stars in the halo, and to gain insights on the intrinsic 
uncertainties in the inferred dynamical mass of the Milky 
Way. 
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2 STEADY STATE TRACERS 

As the fundamental concept used in this work, we start by 
deriving the steady-state distribution of test particles, which 
is used as the definition of tracers throughout. 


2.1 The orbital Probability Density Function 

We consider steady-state systems consisting of a set of test 
particles (a tracer) moving under a gravitational potential. 
We require both the total potential and the phase-space dis¬ 
tribution of the tracer particles to be in a steady state, i.e., 
to not evolve with time. As long as the total potential is 
static, we do not care whether it is generated by the tracer 
alone or purely by an external field, with the tracer being 
massless, or from both components. The static potential as¬ 
sumption is reasonable as long as the crossing time for tracer 
particles is much smaller than the timescale for the variation 
in potential. 

Under the static potential condition, each particle has 
a fixed and predictable orbit. If the tracer particles are to be 
in a steady state, then for any given orbit, the probability of 
observing one particle at a given position (labelled by some 
parameter A) has to be proportional to the time it spends 
at that position, i.e., 

dP(A|orbit)/dA oc dt(A|orbit)/dA. (1) 

In other words, if each particle has a fixed orbit, then the 
travel time on different parts of the orbit determines the 
density of particles observed along the orbit. We call equa¬ 
tion o the orbital Probability Density Function (oPDF). 
This can be understood as arising from the ergodicity of 
each particle in the steady-state system. The static poten¬ 
tial leads to fixed orbits. If the overall system is in a steady 
state, ergodicity translates each orbit into a PDF. 


2.2 oPDF from phase-space continuity 

Formally, we can derive the oPDF from the phase-space con¬ 
tinuity equation (i.e, the collisionless Boltzmann equation) 
of the steady-state system as shown below. Readers not in¬ 
terested in the proof can skip this subsection. 

Let us consider the two requirements of our steady state 
system: 

Static Potential. Since the potential is fixed, each particle 
has a fixed and predictable orbit. Formally, one would be 
able to specify the phase-space coordinates with a set of 
orbital parameters Qi(i = l...n) determining the shape of 
the orbit, plus one affine parameter, A, specifying the cur¬ 
rent position of the particle along the orbit. Note that A 
can be any parameter that uniquely specifies the position of 
the particle on the given orbit, e.g., radius r, velocity v or 
the elapsed time since apocentre. Then the phase space of 
tracer particles is fully sampled by the distribution of orbits 
and the current position of the particles on each orbit. The 
distribution of orbits is fixed in a collisionless system, and 
any evolution of phase-space density is only caused by the 
change of the on-orbit position of each particle. In this co¬ 
ordinate system, the phase-space continuity equation reads 


df y- d(fQj) djf A) 

dt ^ dQi d\ 

z 


( 2 ) 


Since Qi = 0, we have 

df d(f\) 
dt d A 


(3) 


Steady-state. To be able to predict the distribution of par¬ 
ticles we require a tracer population to be in a steady state, 
that is, df /dt = 0 at any point in phase space. Immediately, 
from equation 0 this implies d(fX)/dX = 0 and hence 


^|Q«/(Q,A)oc i, (4) 

where Q denotes the set of orbital parameters. That is, the 
probability of observing a particle at a given position is pro¬ 
portional to the time it spends near that position: 

dP\Q oc -i— — dt. (5) 

A 

Q.E.D. 

Note this is phase-space continuity and is more general 
than configuration-space continuity for steady flows. The 
oPDF is a fundamental equation governing the distribution 
of steady-state tracers. It is a very general result that follows 
from very general assumptions. We will also call this phase- 
space steady-state distribution the equilibrium distribution. 
Note that the definition of a tracer puts no constraint on 
the distribution of orbits. A tracer with any arbitrary dis¬ 
tribution of orbital parameters can be constructed, as along 
as the oPDF is satisfied. 


2.3 oPDF in spherical systems 


In this work we focus the application of the oPDF to a spher¬ 
ically symmetric potential, whose value depends only on ra¬ 
dius ip(r, 9, </>) = In this conservative central force field, 
the binding energy, 

E = ~(y + y + i>{r)), (6) 

and angular momentum, 

L = rx v (7) 

= rv t e L , (8) 


of each particle are conserved, and form a complete set of 
orbital parameters. Taking r as the affine parameter A along 
the orbit, equation © becomes 


dP(r\ E , L) = fL 


1 dr 

tR’ 


(9) 


where T = f dt is the period of the orbit. Note that we 
only need L rather than L if we are only interested in the 
radial motion of particles. Since the orbit is symmetric for 
the inward and outward-going parts, we ignore the direction 
of the radial velocity and take one single journey between 
pericentre r p and apocentre r a as one period. When radial 
cuts are imposed, we only need to replace the orbital lim¬ 
its r p with rnax(r p ,r m i n ) and r a with min(r a ,r max ), since 
equation [T| holds within any radial range. 
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More generally, in equation © the choice of the po¬ 
sition variable, A, is not limited to the radial coordinate. 
It can be any variable that uniquely determines the phase- 
space coordinate on its orbit. As an example, we can choose 
A to be the travel time a particle has spent to get to the 
current position. In this coordinate, the distribution of par¬ 
ticles is uniform along the orbit. If we define an angle at 
each position r as 


6(r) 


fr p dr/\v r \ 
T 


( 10 ) 


where r p is the pericentre distance, then the orbital PDF 
becomes 


dP{6\E,L) = dO. (11) 


which is purely a function of ( E,L ). Q.E.D. 

Put simply, the Jeans theorem implies a known radial 
distribution, so that the full phase-space DF only needs to 
be specified in (E, L) coordinates. 

Starting from the oPDF, one can construct the full DF 
from Bayes’ theorem, by specifying the distribution of orbits, 
P(E,L), of the tracer, as 

dP(r,v) = dP(r\E,L)dP(E,L). (19) 

Depending on how P(E, L) is specified, the constructed DF 
varies. A popular way of specifying P(E, L) relies on the 
radial profile constraint 

! J^Tv d3v = p{r) - (20) 


This PDF is a uniform distribution, with 9 £ [0,1]. This an¬ 
gle is known as an action-angle, and its randomness has been 
argued for or assu med in previous w orks (“random phase 
principle” in BL04; [Bow et all |201C)I ') . Here we do not as¬ 
sume randomness of the angle; rather the randomness is a 
derived property from the continuity equation of the steady- 
state system coupled to the uniform time evolution of the 
action-angle. We will call this angle the radial phase. 

Despite the focus on spherically symmetric potentials in 
this work, the oPDF does not require spherical symmetry 
for the tracer distribution. For example, the oPDF holds for 
a tracer on a single elliptical orbit with the same ( E,L). 
The method can be generalized to (E, L) orbits where the 
asphericity of the orbits is explicitly used. 


2.4 Equivalence to Jeans theorem and connection 
to other DFs 


A fundamental constraint on the DF of steady-state systems 
is provided by the Jeans theorem. It is useful to clarify how 
it connects to the oPDF. 

Below we demonstrate the connection in a spherically 
symmetric system. In such a system, ignoring the angu¬ 
lar distribution, each particle has three phase-space coor¬ 
dinates, which can be specified by (r,v T ,Vt) or equivalently 
( E,L,r ). The phase-space distribution function can gener¬ 
ally be written as 


dP(r, v) 


f(E,L,r)d 3 rd 3 v 


f(E,L,r)8n 2 L 


drdEdL 

M 


( 12 ) 

(13) 


Now we prove the equivalence of oPDF with Jeans theorem. 
If Jeans theorem holds, i.e, f(E, L, r) — f(E, L), then 


dP(r\E, L) 


d P(E,L,r) 
f r dP(E, L, r) 


dr 

\vr(E,L,r)\' 


Conversely, if the oPDF holds, then 


(14) 

(15) 


dP(E, L, r) = dP(E, L) dP(r\E, L) 


d 2 P(E,L) dr 
dEdL | tv | T 


dEdL. 


Combining with Eq. m, we have 


f(E,L,r) 


1 d 2 P{E,L) 

8t r 2 LT(E,L) dEdL 


(16) 

(17) 


(18) 


This is obtained mathematically through, for example, an 
Abel transform, with p(r) being the parametrized density 
profile of the tracer. Even though Eq. m is not explic¬ 
itly used, its equivalent, Jeans theorem is used to pro¬ 
pose a DF of the form f(E,L). However, at this stage, the 
mathematical inversion of Eq. (1201) is not generally solv¬ 
able without further restrictions, and one typically needs to 
further assume some more sp ecific forms of f(E,L), for ex¬ 
ample f(E, L) =L~ 2f3 F(E) (I CarrinJ 1 1951: 1 Cud de fordl 1 1991 1 : 
IWilkinson fe EvansHl999l : IWang et alj 2015l l. In some other 
works, the radial constraint is not used and a more gen¬ 
eral distribution of orbits is proposed (IBovv et al.l 120101 : 
lMagorrianll2014l ~). 

Note the distribution function constructed following 
Eq. m is non-negative as long as d P(E, L ) is non-negative, 
because dP(r\E, L) oc dt > 0 always holds. As we will see 
later, in practice we approximate the d P(E, L) with the 
empirical distribution given by Eq. 1241), which is always 
non-negative and corresponds to the discrete realization of 
a physical DF. 


3 DATA: IDEAL TRACERS 

In order to test the performance of potential estimators, 
we first generate a set of Monte-Carlo steady-state tracers. 
Tracer particles are generated according to th e probab i lity 
distrib ution dP(r,v) = f(E,L)d 3 rd 3 v used in lWang et al.l 
(l2015l l. This DF is constructed by inverting a double-power 
law tracer density profile, assuming f(E,L) = f(E)L ~ 2/3 
and a Navarro-Frenk-White (NFW iNavarro et al.l 1 1 99(1 
Il997h potential. It describes a spherically symmetric steady- 
state system of tracers inside an nfw halo. The detailed form 
of the DF is complic ated and we refer the reader to Eq. 12 
of lWang et all l2015l l for a full description. The parameters 
of the DF include the mass, M, and concentration, c, of the 
nfw halo, the tracer velocity anisotropy, 0, and the double 
power law slopes and pivot radius of the tracer density pro¬ 
file, a, 7 and r p . Their values are chosen to best match the 
distribution of mock stars inside a Milky Way (MW ) sized 
halo in the Aquarius simulation (ICooper et alj|20ld b with 
M = 1.83 x 10 12 M 0 ,c = 16.2,0 = 0.715, r p = 69.0 kpc, a = 
2.30 ,7 = 7.47. The mock tracers generated according to 
this f(E, L) are expected to be a self-consistent, yet sim¬ 
plified, realization of the distribution of stars in the MW. 
Tracer particles are generated between 1 and 1000 kpc in 
radius. We will call these catalogues ideal tracers. Since it 
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is a steady state system, the oPDF is applicable^ In fact, 
as we have discussed in Section 12.41 any f(E, L) DF has to 
be compatible with the oPDF but it also imposes additional 
assumptions. 

In real observations, the phase space coordinates of 
tracer particles are inevitably affected by observational er¬ 
rors. We do not consider such errors in the main portion of 
this paper. In the following we simply use the ideal tracer 
with their exact phase space coordinates, to test our meth¬ 
ods. We briefly discuss possible extensions of applying our 
methods to real data in Section [6A1] 


4 PARAMETRIC POTENTIAL ESTIMATORS 

The problem we are trying to solve is: given a tracer popu¬ 
lation in equilibrium, with observed positions and velocities, 
how do we infer the potential of the steady-state system in 
which it resides? For any assumed potential, we can convert 
the positions and velocities of particles into (r, E, L ) or 9 co¬ 
ordinates. This results in empirical distributions for the par¬ 
ticles in these coordinates. By comparing these distributions 
with the oPDF expected for tracer populations it is possible 
to constrain parameters of the underlying potential. Below 
we consider various parameter estimators that compare em¬ 
pirical and expected distributions: two minimum distance 
estimators given in BL04, and one maximum likelihood es¬ 
timator that we develop in this work. 


4.1 Minimum Distance Estimators and Parameter 
Degeneracies 

For minimum distance estimators, one constructs a metric 
to specify the distance between the empirical and theoreti¬ 
cal distributions, and minimizes this distance to infer model 
parameters. Since the phase angles are computed quantities 
assuming a model potential rather than observed quantities, 
one cannot construct likelihood estimators using the distri¬ 
bution of the angles (the differentiation of angles would in¬ 
troduce model dependence). Following BL04, we consider 
two distance measures to quantify the consistency of the 
data with a uniform phase angle distribution. For a sample 
of N particles drawn from a uniform phase distribution, the 
mean phase, 9, is expected to follow a normal distribution 
with mean 0.5 and standard deviation 1/V12 N according 
to the central limit theorem. The normalized mean phase 
deviation, 

@ = V\2N (0 — 0.5), (21) 


1 Strictly speaking, the DF of I Wang et alJ (|2015h only describes 
a system of massless tracers in an external NFW potential, be¬ 
cause the tracer density would exceed the total density as r —> 0 
unless the tracers were massless. However, as we state at the be¬ 
ginning of section [2] our definition of a steady-state system does 
not depend on the origin of the potential and equally applies to 
massless tracers in an external potential. So this is not an issue 
for our analysis. It may also be worth noting that the tracer ve- 
l ocity disper s ion ap proaches 0 at r = 0 for this DF according to 
l An fc Evans (12009 1. However, this does not prevent the system 
being in a steady-state. In addition, our radial cut of 1 to 1000 kpc 
ensures we are not affected by the behaviour at the centre. 


is then a standard normal variable. 0 2 then serves as a mea¬ 
sure of the distance of the actual phase distribution from 
the expected uniform distribution. If the data follows the 
model distribution, then the 0 2 from different realizations 
of the same distribution will follow a yf distribution with 
one degree of freedom. Hence, the discrepancy level of the 
data from the model can be quantified by the probability of 
obtaining a \ 2 as extreme as the measured value of 0 2 . 

A more sophisticated distance measure can be con¬ 
structed by comparing the cumulative data distribution, 
P<g, to the expected distribution, P < g = 9, as (BL04) 


D = 



( P <°- p <f d 9. 

var (Pee) 


( 22 ) 


where var (P<e) = 9(1 — 9)/N is the va riance of P < g. This is 
know n as the Anderson-Darling (AD lAnderson fe Darlind 
Il954h distance measure. For a set of N particles with phase 
angles, Eq. 121 can be evaluated: 


D = -N+jjJ2(l-2i)ln0i-[l+2(N-i)]ln(l-0i). (23) 

i=1 

The above form is simpler than the one derived in BL04. 
The theoretical distribution of D can be found from Monte- 
Carlo simulations. A fitting formula is given by BL04 which 
fits the tail of the distribution. In Appendix m we provide 
a binormal fit to the distribution of ln(D) that works very 
well for the full DF. 

A small value of 0 2 or D suggests a good match be¬ 
tween data and model. Hence, these two distance measures 
can be used to fit the data to parametric models, by search¬ 
ing for parameters that minimize the distances. Confidence 
intervals can also be defined by distance contours chosen so 
that the probability of obtaining a distance measurement 
as extreme as that of the contour value equals the desired 
confidence level. 

In Fig. Q] we apply the minimum distance estimators 
to an ideal tracer of 1000 particles. It is obvious in Fig. [T] 
that the mass-concentration parameters are highly degener¬ 
ate for both estimators. For the mean-phase estimator, there 
is not a unique minimum distance point but a minimum dis¬ 
tance line with 0 = 0, as marked by the central red dashed 
line. As a result, there is not a single best-fitting param¬ 
eter, but a line of degenerate solutions. The AD estimator 
shows a slightly weaker degeneracy, which can be understood 
because it uses more information than the mean-phase es¬ 
timator. However, the best-fitting parameters from the AD 
estimator still depend sensitively on the initial parameters 
due to the degeneracy. 

Even though the usefulness of these two statistics as es¬ 
timators is severely limited by the strong degeneracies, they 
can still be used as theoretical probes to identify regions of 
discrepancy in heterogeneous data. In particular, the signed 
mean phase deviation as a standard normal variable is easy 
to calculate as well as being easy to interpret, making it a 
good residual measure of the phase distribution under the 
proposed potential. We demonstrate such an application in 
Section [5]21 Similar applications can be found extensively in 
paper II when analysing tracers from cosmological simula¬ 
tions. 
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Figure 1. Constraints on the mass, M, and concentration, c, 
derived from 1000 particle realizations drawn from a model nfw 
halo. The outer (blue) and inner (brown) contours mark the 1 and 
3 a confidence regions respectively, for the mean phase (dashed) 
and AD (solid) estimators. The central red dashed line marks 0 = 
0. The parameters are expressed in units of the true parameter 
values, Mq and cq. 



Figure 2. The radial likelihood function. We apply the unbinned 
(Eq. ||30> . red spiky curve) and binned (Eq. (1311) . black smooth 
curve) likelihood estimator to the same sample of 1000 ideal tracer 
particles in an NFW halo. The concentration parameter is fixed 
to be the true value and the likelihood is calculated as a function 
of the mass parameter, M , normalized by the true mass, Mo- 
The likelihood ratio, 2 In(£/£ max ), is plotted where £ max is the 
maximum likelihood value as the mass is varied. We adopt 30 
bins logarithmically space between 1 and 1000 kpc in radius in 
the binned case. 


4.2 Breaking the degeneracy: the radial likelihood 
estimator 

In addition to the minimum distance estimators in 0-space, 
we also try to construct maximum likelihood estimators. 
Since 9 is not a direct observable but is model dependent, 
its PDF cannot be directly used to construct a likelihood. 
Instead, we work with the directly observable r-coordinate 
to calculate the likelihood of the observations, making use 
of the oPDF, P(r\E, L). Since E is unknown before the po¬ 
tential is known, trying to use the conditional probability 
C = Yli^P( r i\Ei, Li) as a likelihood to infer the potential 
will fail. Since P(E, L) is solely a characteristic property of 
the tracer (tracers with any arbitrary P(E, L) can be con¬ 
structed or selected) and is independent of the potential, one 
could seek to eliminate the ( E , L) dependence by marginal¬ 
izing over their prior distributions. 

A proper marginalization can be done if one knows the 
P(E, L) distribution. If one introduces additional assump¬ 
tions on P(E,L) (e.g.. IBovv et al.1l2Q10i ). then the method 
essentially reduces to a ,f(E, L ) method, whose generality is 
limited by the assumptions made. We would like to avoid 
any such assumptions in order to avoid any potential bias 
in the inferred potential introduced through them. Without 
prior knowledge of P(E, L), we can approximate it with the 
observed distribution 


d 2 P(E,L) 


+ + ( 24 ) 

i 


Now the marginalized distribution becomes: 

dP(r) = J AP(r\E,L)^^^dEdL 

1 N 

= -E dP(r \Ei’ Li )- 

i=1 


(25) 

(26) 


This is the mixed (empirically marginalized) radial distribu¬ 
tion, an analogy to the marginalized theoretical radial dis¬ 
tribution. We can define a reciprocal probability of finding 
a particle at r, with the orbital parameters corresponding 
to ( Ej , Lj) as 


Pij — 


AP{n\Ej,Lj) 

dri 


1 

v r (ri,Ej, Lj)Tj 


(27) 

(28) 


Then, the marginalized radial distribution becomes 


dPjn) _j_ v - ' p 

dri N ^ 13 

3 = 1 


(29) 


This P(ri) is actually the posterior probability of finding a 
particle at position ri given the orbital parameters of all the 
tracer particles, P(n\Ei, E 2 , ...En, Li, L 2 , ...Ln). Now the 
likelihood can be written as 


N N 

C = UY. />,j ■ (3°) 

i=l 3=1 

However, the above likelihood function is very noisy. I 11 
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Fig. [2] we show an example of this likelihood as a func¬ 
tion of the halo mass parameter for the ideal tracer. Over¬ 
all, the global shape of the likelihood function peaks around 
the true parameter value. On the other hand, the likelihood 
function is not at all smooth. Moreover, one does not get 
rid of this bumpiness by zooming into a finer grid in M. 
The noisy behaviour originates from the singularities in the 
PDF at peri and apo-centres, and from the discreteness in 
the (E, L) distribution, which we described as a sum of 8- 
functions. The noise in the likelihood can thus be regarded 
as Poisson noise. This Poisson noise prevents accurate in¬ 
ference of parameter values. Some form of smoothing can 
suppress the Poisson noise. For example, one could try to 
do kernel interpolation in the (E, L) distribution, to make 
it a continuous rather than a disc rete distribution (see e.g., 
iBovv et aHl20ld : lMagorrianll2014l f. The simplest smoothing 
strategy would be to bin the data. If we bin the data radi¬ 
ally into m bins, then the binned version of the mixed radial 
likelihood can be written as 

m 

C = Y\™i i exp(— hi) 
i=1 

= exp ^ hi 

m 

= exp(-A) n" 4 , (31) 

i=i 

where rii is the number of particles in the i-th bin. We have 
omitted the data-dependent constants in the above equa¬ 
tion, and hi = N due to the normalization of the PDF. 
The predicted number of particles in the i-th bin is given by 

hi = N [ U '' dri ( 32 ) 

J r hi d n 

where r\ d and r u ,i are the lower and upper bin edges, and 
P(r;) is given by equation (1291) . The binned likelihood curve 
is also shown in Fig.[5]for the same sample. Clearly, the Pois¬ 
son noise has been suppressed and the likelihood function is 
now smooth and usable for parameter inference. 

In Fig.[3]we apply the binned radial likelihood estimator 
to the ideal tracers. In the left panel, the three estimators 
are applied to the single realization of 1000 particles used 
in Fig. [T] The degeneracy we have seen in the phase angle 
estimators is broken in the radial likelihood estimator. The 
estimators are applied to many (750) independent realiza¬ 
tions of the same f(E, L) distribution, and the distribution 
of the best-fit parameters are plotted in the right panel of 
Fig-El This test shows the estimator to be unbiased when av¬ 
eraged over many realizations. For comparison, we also show 
the result of a likel i hood analysis using the full f(E, L) dis¬ 
tribution as in I Wang et al.l (l2015l l. Note that since the data 
are generated with this exact DF, the f(E, L) likelihood ap¬ 
plied to these ideal tracers represent the best constraint one 
can get from any likelihood inference. 

The radial likelihood estimator gives wider but still 
comparable confidence intervals as the full f(E, L) estima¬ 
tor. Note that we have not assumed anything about the 
( E , L ) distribution of the tracers and that the mock cata¬ 
logue is used blindly. The above test is a general demonstra¬ 
tion that our method will work for any steady-state trac¬ 
ers in a static spherical potential. Compared to the perfect 


f(E, L ) method, where the adopted f(E, L) exactly matches 
the form of the unknown underlying distribution function, 
the confidence interval is wider but, in practice, the f(E, L) 
method will be biased if the tracer follows a ( E , L ) distribu¬ 
tion other than the specific f(E, L) model adopted. With a 
small increase in noise, our radial likelihood method gains 
generality. Compared with the minimum distance estima¬ 
tors, the radial likelihood estimator breaks the degeneracy 
in the mass concentration parameters. Since the likelihood 
can be interpreted as the conditional probability of the data 
given the model, in principle it can also be adopted in an 
Bayesian analysis. 

We now make a few comments on the practical applica¬ 
tion of the binned likelihood estimator. Since the purpose of 
the binning is purely to suppress shot noise, a larger number 
of bins is generally better, as long as it is not too noisy. On 
the other hand, when the likelihood contours appear too ir¬ 
regular, one should try reducing the number of radial bins to 
ensure the irregularities are not caused by shot noise. In our 
analysis, we have adopted 30 logarithmic bins for an ideal 
sample of 1000 particles, and 50 bins for 10 6 particles in a 
realistic halo (paper II), although as few as 5 bins could still 
work. Due to the singularity of the 1/v integrand at orbital 
boundaries, it is expensive to achieve high accuracy for the 
phase calculations. As a result, it is difficult to use algo¬ 
rithms involving numerical derivatives for the optimization 
of the likelihood values. Instead, we adopt the Nelder-Mead 
simplex minimizer to search for the maximum of the likeli¬ 
hood. 


5 RECONSTRUCTING THE POTENTIAL 
PROFILE: THE PHASE-MARK 
NON-PARAMETRIC METHOD 

5.1 Towards understanding the parameter 
degeneracy 

The strong parameter degeneracy with the minimum dis¬ 
tance estimators is easy to understand when we examine 
the constraints on the mass profile of the halo, as shown 
in Fig. [I] Parameters yielding the same mean phase devi¬ 
ation, 0, all predict the same mass M(< R c ) = M c inside 
a characteristic radius, R c , of the tracer, which is close to 
the half-mass radius of the tracer. Different 0 values corre¬ 
spond to different M(< R c ), with a positive correlation be¬ 
tween the two. In other words, the mean phase of the tracer 
is an estimator of the gravitational force GM(< R c )/Rf 
or circular velocity around the characteristic radius of the 
tracer population. On the other hand, this estimator barely 
constrains the gravity elsewhere, leaving the shape of the 
mass profile unconstrained. Parameters leading to the same 
M(< R c ) but different shapes in the mass profile are thus 
indistinguishable by the mean phase estimator, resulting in 
the parameter degeneracy in Fig. [I] The radial likelihood 
estimator breaks this degeneracy by its ability to also con¬ 
strain the shape of the profile, as illustrated in the right 
panel of Fig. [4] 

The positive correlation between the mean phase 0 and 
the characteristic mass, M(< R c ), can be understood qual¬ 
itatively. For profiles with the same shape (on a logscale), a 
higher M(< R c ) leads to deeper potential everywhere. For a 
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Figure 3. The la confidence region for the different estimators. In both panels, the dark and light shaded regions are the confidence 
regions from the AD and mean-phase estimators respectively, while the green line marks the confidence region from the radial likelihood 
estimator (RBin), and the black line marks that from the f(E , L) estimator. In the left panel, the confidence regions are inferred from 
a single sample of 1000 particles. The blue and black points correspond to the best-fit parameters of the radial likelihood and f(E , L) 
estimators. In the right panel, the confidence regions represent the 68.3% most probable region of the best-fitting parameters, according to 
the distribution of best-fitting parameters from 750 independent samples. The blue and black points correspond to the average parameters 
from the radial likelihood and f(E,L) estimators respectively. Note that since the AD and mean phase fits are sensitive to the initial 
guess of the parameter values due to their strong degeneracy, we have randomly picked the initial values when fitting each sample. The 
f(E , L ) and RBin fits are independent of the initial parameters. The samples used here are generated according to the f(E , L ) DF with 
true parameters (Mq,co); hence the f(E , L ) fit is, by construction, the best constraint one can obtain. 


V 



R /kpc 



Figure 4. Constraints on the mass profile from different methods. In the left panel, we plot the predicted mass profiles adopting 
parameters lying on the © = 0 and ±3 lines in the parameter space in Fig. [T](© = 3, 0, —3 from top to bottom near the median radius). 
The vertical line marks the median radius (i.e., the half mass radius) of the tracer. In the right panel, the © = 0 profiles are compared, 
over a wider radial range, with those adopting parameters on the la contour of the binned radial likelihood estimator. The span of the 
latter, that is, the la prediction bounds of the likelihood estimator, are marked by the shaded region. Note that since the constant © 
lines are not closed in the parameter space (e.g., Fig. [TJ, there could be many mass profiles with shapes far more different from those 
plotted in this figure that still share the same ©. In other words, the shape variation at the same © could be much larger than plotted. 
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Figure 5. Mass profiles adopting parameters on constant © lines, demonstrating the characteristic radius of the tracers. These are the 
same as the left panel of Fig. [4] but using two tracer samples with different radial cuts. The left panel uses a sample from 1 — 30 kpc, 
while the right panel uses a sample from 30 — 1000 kpc. The vertical black lines mark the half mass (i.e., median) radius of each sample. 
The sample sizes are both 1000 particles and are selected by applying only radial cuts to the parent samples constructed in Section [3] 




Figure 6. The mass profile constrained with the phase-mark method. The left and right panels are the same except that they have 
different number of radial bins. The bins are defined to have equal numbers of tracer particles, by subdividing a parent sample of 1000 
particles. In each upper panel, the vertical dotted line marks the median radius of the full sample of tracers; the black solid line shows 
the true mass profile of the halo; the green dashed line is the best-fit profile using the radial likelihood method; the blue solid lines 
are the best-fit point-mass and isothermal profiles in each radial range. The point where the two blue lines cross (marked by points 
with error-bars) gives the characteristic mass in each bin. The errorbars are the uncertainty in the fitted point-mass parameter. The red 
dashed line is the best-fit NFW profile to the characteristic mass points. The bottom panels are the corresponding mass profiles divided 
by the true profile. The shaded region shows the 1 -a uncertainty on the fitted profiles from the radial likelihood (green) and from fitting 
NFW profiles to the characteristic masses (red). 
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Figure 7. Same as Fig. [3] but also showing the constraints from fitting NFW profiles to the phase-marks. In the left panel, the red 
solid and red dashed contours show the 1-tr confidence regions obtained by dividing a single sample of 1000 particles into two and five 
radial bins respectively. The red filled and open squares in the centre of each contour show the corresponding best-fit parameters. In the 
right panel, the contours show the 68.3% most probable region of the best-fitting parameters, according to their distribution obtained 
from many (750) independent samples of 1000 particles each. The points in the centre show the median best-fitting parameters. Again 
the result for the phase-mark with two and five bins are shown in red solid and dashed contours respectively, while the filled and open 
squares show the corresponding median parameters. 


particle with a given position and velocity, a deeper poten¬ 
tial of the same shape will shift both its peri and apocentres 
closer to the centre of the halo. As a result, the current 
location of the particle relative to its peri and apocentres 
is shifted outward, increasing its phase angle 9. The mean 
phase of all particles thus increases with the characteristic 
mass. 

The location of the characteristic radius determines 
the shape of the mean phase line in parameter space. To 
demonstrate this, instead of working in the (M, c) param¬ 
eter space, it is more convenient to work in the (M s ,r s ) 
space, where M s = 4n p s r^. Suppose the constant 0 lines 
are described by M s = f@(r a ), then we have M(< r) = 
/e(r s )[ln(l + r/r s ) — (r/r s )/(l + r /r s )]. Now the character¬ 
istic radius r at which the mass does not vary with the halo 
parameter r s is given by 

dM{< r,r s ) =Q; (33) 

OT s 

whose solution r = R c depends on the contour line function 
/e(r a ). So the functional form of the contour line determines 
R c , and vice versa. 

It is worth clarifying that the characteristic radius, 
hence also the shape of the degeneracy curve, is determined 
by the distribution of the tracer and is not an intrinsic prop¬ 
erty of the halo. We demonstrate this in Fig. 0 where the 
characteristic radii of two ideal tracers with different radial 
ranges are shown. The two tracers are constructed by sam¬ 
pling from 1 — 30 and 30 — 1000 kpc respectively from a 
parent population whose half mass radius is roughly 30 kpc. 
Clearly, the two samples have different characteristic radii, 
which are both close to their own half-mass radii, while the 
haloes hosting the two samples are identical. Correspond¬ 


ingly, we have checked that the mean phase contours have 
different slopes in log(M) — log(c) space from that shown in 

Fig- ED 

The existence of a best-constrained mass despite the pa¬ 
rameter degeneracy is broadly in line with empirical results 
on the robustness of the mass constraint inside the half- 
ligh t radius from Jeans equation modelli ng of dwarf galax¬ 
ies (IWalker et al.ll2009l ; IWolf el ~aT1 1201 fll ) . The existence of 
such a ch aracteristic point in the Jeans analysis is further 
proved bv iWolf et al.l (|201Cll ). However, their results are con¬ 
cerned with the insensitivity of the mass estimate to the 
velocity anisotropy parameter of the tracers, which has to 
be fitted or assumed because only line-of-sight velocities are 
available in these studies. On the other hand, our parame¬ 
ter degeneracy arises from solving the mean-phase equation 
using the full 6D data. Anisotropy is not a parameter in our 
model at all. 

Another closely rela ted result to ours is presented by 
lAmorisco fe Evansi (1201 ll . AE11 hereafter), who studied the 
underlying potential of dwarf spheroidals assuming a low¬ 
ered isothermal DF. They found that the structural param¬ 
eters ( p s ,r s ) (or (po, ro) in the original notation of AE11) of 
NFW-like haloes are constrained by the observed (i?h,cro) 
of each dwarf spheroidal to follow a certain relation, p s (r s ), 
where Rh is the projected half-light radius and ao is the 
line-of-sight velocity dispersion at the centre of the dwarf 
spheroidal. This resulted in a best-constrained mass near 
a common characteristic radius R c = 1.7 R^ for almost all 
the dwarf spheroidals. As we explained above (see Equa¬ 
tion 13311 . the existence of this characteristic point can be 
understood because the two parameter halo profile is re¬ 
duced to a one parameter family due to the constraint of 
the problem. In AE11, the constraint comes from matching 
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the observed (i?h,oo) of each system. By contrast, in our 
case a constrained relation, p s (r s ), or equivalently, M(c), 
is determined by solving 0 = 0 (see Fig. [T|). Note that it 
is not expected that an arbitrary constraint (for example, 
r s (ps) = Const) would always result in a best-constrained 
mass in the mass profile, so the similarity between the results 
by AE11 and ours is intriguing. Despite the apparent simi¬ 
larity, our result is purely theoretical, while that of AE11 is 
empirically driven by the observed quantities, (Rh,cro). Our 
finding is expected to apply to steady state tracers in gen¬ 
eral, not only to those described by the lowered isothermal 
DF or to the observed dwarf spheroidals studied in AE11. 
In our general case, the characteristic radius is not a con¬ 
stant value in units of the median radius, as is evident in 
Fig. [S] We checked that the same is true (i.e., the scale is 
not universal) in units of the projected half-mass radius. By 
contrast, the common characteristic radius in AE11 is likely 
to arise from some common properties shared by the dwarf 
spheroidals studied there, namely the tight correlation be¬ 
tween (Rh, fr 0 ). 

5.2 The phase-mark method 

The experiment in Fig. [5]also suggests a way of breaking the 
degeneracy in the mean-phase estimator, by applying it to 
two or more subsamples split in radius. In this way the shape 
of the mass profile can be constrained as well, since different 
subsamples constrain the characteristic mass, M c , at differ¬ 
ent radius, R c . For parametric fits, the degeneracy lines in 
Fig. Q] would have different slopes for subsamples with dif¬ 
ferent R c , so they could jointly determine a unique best-fit 
parameter set. Even better than that, it is possible to re¬ 
construct the mass profile non-parametrically, thanks to the 
insensitivity of the mean-phase constraint on the shape of 
the mass profile. The fact that the mean phase only depends 
on the characteristic mass point means one can start from 
a profile of an arbitrary shape and still obtain the correct 
characteristic mass, by requiring the profile to produce the 
correct mean phase. As a result, it is not necessary to know 
the functional form of the true profile in order to constrain 
( R C ,M C ). By applying the mean-phase constraint 0 = 0 
twice to two different single-parameter profiles and looking 
for the point where they intersect, one can simultaneously 
obtain both the characteristic radius and the characteristic 
mass. Note that only having M c is not enough, since R c 
is unknown even though it is close to the tracer half-mass 
radius. 

We demonstrate this in Fig. [t[] In the left panel, we 
divide the tracer sample of 1000 particles studied in Fig. 0 
into two sub-populations according to radius. Inside each 
bin, we fit two mass profiles with the mean-phase estimator: 
1) point mass profile, M(< R) = M c with parameter M c ; 2) 
isothermal profile, M(< R) = kR, with parameter k. The 
mean phase constraint 0 = 0 uniquely determines a best fit 
for each profile. As expected, they cross the true profile at 
the same point, which marks the characteristic point of that 
bin as ( R c = M c /k, M(< R c ) = M c ). We name this method 
the “phase-mark”. Splitting the tracer into more bins, we can 
obtain finer constraints on the profile, as shown in the right 
hand panel. By doing this, we have reconstructed the true 
mass profile non-parametrically, without any assumption on 
the true profile. Such reconstructed mass profile becomes 


noisier when a larger number of bins is adopted (right panel), 
since each subsample becomes smaller. 

If desired, one can still fit a parametric function through 
the reconstructed profile, as shown by the red dashed line 
in each panel, with confidence regions on the fitted profile 
marked by the red shaded regions in the lower panels. After 
combining all the radial bins, the tightest constraint is still 
found near the half-mass radius of the full tracer sample. It 
is interesting to see that although a larger number of bins 
helps to obtain finer reconstruction of the profile, it does not 
lead to a better constrained profile after fitting. It appears 
that the constraining power using only two bins is close to 
that of the radial likelihood method. This is confirmed by 
the more direct comparisons made in Fig. [7] The confidence 
region of the phase-mark with two bins has a comparable size 
to that of the likelihood method. This means they are sim¬ 
ilarly efficient at making use of the dynamical information. 
Adopting finer bins in the phase-mark results in a looser 
constraint. This can be understood because each mark only 
exploits the local phase uniformity inside each radial bin, 
while the large scale variation from bin to bin is not taken 
into account, resulting in a leakage of information. A po¬ 
tential improvement would be to combine bins at different 
scales. However, it should be kept in mind that doing this 
will introduce correlations among the marks, making the er¬ 
ror analysis difficult. In the right panel, the phase-mark with 
two bins is applied to many independent Monte-Carlo real¬ 
izations of the same system as before, to show that the fit 
is statistically unbiased. 


6 DISCUSSION 

6.1 What is a tracer population? 

Dynamical modelling require the tracer sample to be defined 
first, or subsamples to be selected from a parent sample. 
Here we revisit the question of “what is a tracer population?”. 
Modelling the tracer population with a time-independent 
DF requires the tracers to be in a steady state. For a spheri¬ 
cally symmetric potential, this requirement translates into a 
conditional radial distribution dN/dr oc 1/| v r (E, L, r)| (nec¬ 
essary and sufficient) given E and L, or equivalently, all the 
particles have completely uncorrelated radial phases. Once 
this condition is satisfied, one can use the distribution of the 
sample to infer the potential of the system. As a result, we 
can simply define a tracer as any set of steady-state particles 
moving in the background potential. 

To obtain a steady state subsample, the selection from 
the sample must not distort this conditional radial distribu¬ 
tion and must avoid introducing artificial structure in the 
radial or angle distribution. As long as this is guaranteed, 
any selection in E and L is allowed. For example, one can 
select subregions of the E — L space, while keeping full or 
random sampling in r. For a parent population not in equi¬ 
librium inside a static potential, a steady-state subsample 
can still be selected by sampling according to Eq ©• 

From this definition we also learn how to mix tracers 
with weights. If tracer i has a steady-state phase-space dis¬ 
tribution fi, then the uniformly weighted population Wifi 
is still a tracer. Consequently, N mixed tracers JA =1 Wifi 
is still a tracer, since equation © is satisfied for every sub- 
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Bins 


Figure 8. Mean phase profile evaluated with the two degenerat¬ 
ing parameter sets. From top to bottom we bin the same sample of 
1000 tracer particles according to their r, E, L coordinates respec¬ 
tively, with equal number of particles in each bin. The mean phase 
deviation, ©, is evaluated inside each bin. Different coloured lines 
represent different haloes. The blue solid and red dashed lines 
are the profiles adopting two different potentials: one with the 
real parameter values (M, c) and the other with a parameter set 
(0.6 M, 2.2c) whose mean phase is degenerate with that of the true 
parameters. 

tracer. As a result, when dealing with multi-component trac¬ 
ers using the oPDF, we can either model them as a single 
population, or as several populations separately. 

Obviously, subhalo particles are not steady-state trac¬ 
ers since they are localized structures. Streams in general 
are also not steady state tracers since they are usually char¬ 
acterized by correlated phases. 

6.2 The optimal marginalization 

It is tempting to ask why the radial likelihood estimator 
works better than the phase estimators. Recall that the 
oPDF actually specifies both the randomness of the phase 
angle and the independence of such a distribution on the or¬ 
bital parameters. That is, the phase distribution is not only 
uniform in general, but also uniform inside any (E, L) bin. 
The oPDF also applies to any radial range, so the uniformity 
is expected for any region in the (r, E, L ) space. In Fig. [8]we 
examine the mean phase in subregions of the phase space. 
We divide the data into 30 equal-count bins in each dimen¬ 
sion, and measure the mean phase inside each bin for a given 
potential. This mean phase value indicates the discrepancy 
of the data inside this subregion with uniform phase distri¬ 
bution. We do this for two mean phase degenerate points. 
For the true potential, 0 is consistent with 0 everywhere. 
For the degenerate parameter set, we start to see a depen¬ 
dence on ( E,L ), and 0 is biased positively or negatively at 


different places in the ( E , L) space, even though the com¬ 
bined 0 or the 0 in r space would still be close to 0. This 
test shows that there is still useful information beyond the 
uniform 8 distribution, namely its (E, L) dependence. 

Since the minimum distance estimators do not examine 
the (E, L) dependence, they have effectively marginalized 
over the (E, L) distribution of tracers. One can rewrite the 
definition of the phase angle as 

6(r,E,L) = P(<r\E,L). (34) 

From this point of view, the marginalization is done by work¬ 
ing in the cumulative probability space of the tracers. Al¬ 
though the radial likelihood method also marginalizes over 
the (E, L) distribution, the marginalization combines the 
oPDF from different (E, L) orbits in a different (possibly 
optimal) way. This could result in a marginalized DF that 
is more sensitive to the discrepancies in the conditional dis¬ 
tribution. In Fig. U3 we see that the radial distribution is 
indeed more sensitive, by examining the difference between 
empirical and expected cumulative distributions in 0 and r 
space. 

6.3 Connection to other methods 

Since d P(r,E,L) = dP(r\E,L)dP(E,L), the full phase- 
space distribution of tracers breaks into two parts. The or¬ 
bital PDF is determined by dynamics, and reflects the un¬ 
derlying potential. The P(E,L) part is simply a character¬ 
istic of the tracer not necessarily related to dynamics, and a 
sample with any form of P(E, L) can be constructed which is 
still a valid tracer population. Any DF method has to make 
use of the P(r\E,L) information in some way, but how one 
deals with P(E, L) is not crucial to the determination of the 
potential. 

6.3.1 Comparison with the f(E,L) DF method 

As we have already discussed in Section 12.41 any f(E, L) DF 
function has to be consistent with the oPDF, while impos¬ 
ing extra assumptions on the distribution of orbits. Because 
our method only uses the conditional radial distribution, it 
is fully compatible with the density profile inverted f(E, L) 
method. At the same time, our method has no extra as¬ 
sumptions and is applicable to more general tracers. Also 
note that the density profile inversion involving a particu¬ 
lar parametrization of the potential can s ometimes be quite 
challenging (see, e.g., IWang et alj 120150 . and may not al¬ 
ways be solvable analytically. In contrast, the application 
of any potential function in our oPDF method is always 
straightforward. 

Since the oPDF is given in differential form, it does 
not care about the radial limits of the system. One can ap¬ 
ply the orbital PDF to data within any radial range, e.g., 
from r m i n to r max , since the phase-space continuity equa¬ 
tion holds within any radial range. When radial cuts are 
imposed, we only need to replace the orbital limits r a with 
rnax(r a ,r m i n ) and r p with min(r p , r max ). In this case, the 
data only care about the variation of the potential within 
the same radial range. For the same reason, the zero point 
of the potential, the extension of the halo or tracer den¬ 
sity profile outside the data window, or the boundary of the 
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Figure 9. Cumulative distributions in 8 and r spaces. Left: Cumulative distribution of 8, at two sets of parameter values. The red 
solid line corresponds to the true parameters, while the green solid line corresponds to (0.6M, 2.2c) which give the same mean phase. 
The bottom panel is the difference between the model distribution and the cumulative distribution of a uniform distribution. Right: 
cumulative distribution of r. The black dashed line is the distribution of the data, while the red and green lines are for the true and 
alternatives parameters as in the left panel. 


halo is irrelevant in our method. By contrast, in the f(E, L ) 
method, the DF, f(E, L), has to satisfy the radial constraint 
p(r) = f f(E, L)d 3 v at all r by definition. Any change in 
p(r) at any radius will require an adjustment in the pro¬ 
posed DF. As a result, one has to include the full radial 
range of each orbit in its density profile inversion. This re¬ 
quires a full description of both the potential and the tracer 
density over the full radial range, introducing a dependence 
on quantities outside the data window. Such dependence, 
in turn, requires one to parametrize the tracer and the po¬ 
tential profiles for extrapolation. In particular, for a finite 
size system the boundary condition, p(r max ) = 0, requires 
that no orbit should extend beyond r max , which translates 
into an energy bound, E < —c/>(r max ). In other words, the 
energy of particles has to be bound for a finite size system 
described by f(E,L). This constraint is the main cause of 
the poor mat ch between model and data for simulated DM 
particles in (IWang et al.ll2015l ). This constraint does not 
apply to our method, however, because we do not need to 
study the full radial range of every orbit. For example, our 
method can be applied to an open system with constant in¬ 
flows and outflows! In this sense, the oPDF is more general 
than Jeans theorem. 

Fitting with a full DF also has its advantage over the 
general oPDF method. The prior assumptions on the dis¬ 
tribution of orbits serve to input extra information to the 
model. If these assumptions are correct, the fitting can be 
more efficient, as demonstrated by the performance of the 
true f(E, L) DF fitted to the ideal tracers. On the other 
hand, incorrect assumptions are likely to lead to biased re¬ 
sults in the fits. From this point of view, adding extra as¬ 
sumptions in the construction of a DF is a trade-off between 


efficiency and correctness. Our oPDF method is specifically 
designed to minimize extra assumptions hence maximizing 
correctness. 

6.3.2 Connection to Schwarzschild’s method 

Our radial likelihood method can be regarded as a 
lightweight Schwarzschild’s method. Starting from the 
oPDF, P(r\E, L), one can populate different orbits with 
tracer particles given a potential, and look for weighted com¬ 
binations of orbits that reproduce the observed spatial dis¬ 
tribution of the tracer. The best match then gives an esti¬ 
mate of both the potential and a phase-space distribution 
in the form of combination s of orbits. This is the exactly 
the Schwarzschild method (ISchwarzschildlIl979l) . which es¬ 
sentially converts the p = f fd 3 v into linear equations in 
phase-space grids p(I) = f2jC{J)P{I\J), where I denotes 
configuration grids and J denotes orbit populations. How¬ 
ever, to infer the potential, it is not necessary to solve for 
a general combination of orbits, C(J). Instead, one can ob¬ 
tain the distribution of orbits directly from the observed 
phase-space positions of particles, with each particle deter¬ 
mining one orbit, i.e, C(J) = 1 with J ranging from 1 to 
the number of particles. This is exactly what we do in our 
likelihood method. In this sense, our likelihood method is a 
special type of Schwarzschild’s method, with the population 
of orbits constrained to be the distribution of orbits in the 
data, rather than constructed from an external library. We 
do not lose generality with our choice of orbits, while hugely 
reducing the dimension of the problem by not solving for 
C(J) at all. 

The disadvantage of not fitting for the distribution of 
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orbits in our method is that we have to rely on the full 
phase-space data to initialize each orbit. When only certain 
moments of the DF such as the velocity dispersion profile 
are available, Schwarzschild’s method can still be applied by 
fitting the predicted moments of the proposed DF to the ob¬ 
served ones, while the radial likelihood method constructed 
here cannot. Another advantage of the general Schwarzschild 
modelling is that it can be adopted to construct a self- 
gravitating equilibrium system, which can be used as initial 
conditions for N-body experiments. 


6.4 Generalization of the likelihood method 

Observational data usually involve selection functions de¬ 
scribing the non-uniform sample completeness, noise in the 
measurements of the phase-space coordinates, and even 
missing dimensions in the coordinates. We briefly discuss 
how these complexities can be handled in the oPDF frame¬ 
work, as well as generalizations to non-spherical potentials. 
Given that the focus of this paper is to explore whether a 
general dynamical method is applicable to simulated haloes 
for the inference of the halo potential, we do not push the 
following discussions further to implementation. Instead, we 
leave further tests and improvements of the proposed solu¬ 
tions to future work. In the following, we will focus on the 
likelihood estimator as an example. 


6-4-1 Selection function, noise, missing dimensions 


As we have discussed before, our methods apply to tracers 
with any (E, L) distribution, and hence is immune to any 
selection in ( E,L ). Radial selection can be easily handled 
by modifying the reciprocal probability as 


pi _ PjjSj/ Sj 
ij ~ I PijSi/Sj dr i’ 


(35) 


where Si = S(ri) is the probability of selecting a particle 
into the sample at . If the selection is simply a radial cut, 
then equation (1 35 II simplifies to an adjustment of the nor¬ 
malization factor T. When angular selections are involved, 
one needs to explicitly consider L instead of L as the orbital 
parameters, to model the distribution in r, 9, <j> rather than 
just r. 

The noise in the data can be incorporated as priors. 
Formally, the new likelihood after marginalizing over the 
error distribution can be written as 


C'^Do) = j C^(D)P(D\D 0 )dD, (36) 


where C-^{D) is the likelihood of an error-free dataset, D, 
in a potential ip, P(D\D 0 ) is the probability that the true 
dataset is D given the observed dataset D 0 . Alternatively, 
one can generate Monte-Carlo realizations of the data ac¬ 
cording to the prior distributions, P(D\D 0 ), and apply the 
method to each realization assuming no noise in the mea¬ 
surements. Once this is done, a statistical estimate of the 
effect of the measurement noise on the fits can be obtained 
from the distribution of the best-fitting parameters across 
the different realizations. 

Observational data might also miss some dimensions. 
For example, it is difficult to measure the tangential velocity 
for distant stars in the Galaxy. If only v r is available, then 


it is necessary to introduce additional assumptions on vt 
in order to apply the method, e.g., through an anisotropy 
parameter or anisotropy profile, /3(r). 


6-4-2 Generalization to arbitrary potentials 

For a non-spherical potential, it might be difficult to write 
down the integrals of motion as orbital parameters. How¬ 
ever, the orbit is still fully determined for each particle once 
a potential is assumed, and one can calculate the orbit nu¬ 
merically without knowing the integrals of motion. With 
calculable orbits, we can still predict the spatial distribu¬ 
tion of particles by superimposing the oPDF of each particle, 
and compare with the observed distributions for a likelihood 
analysis of the potential. 


7 CONCLUSIONS 

We have shown that tracers in a steady state in a static 
potential can be characterized by an orbit-dependent distri¬ 
bution function, dP(A|orbit) oc df(A), with A being an affine 
parameter of the position along the orbit. This is a general 
result that follows from the time-independent collisionless 
Boltzmann equation. We clarify that the phase-space dis¬ 
tribution of tracers connects to their host potential only 
through this oPDF, while the distribution of orbits, e.g., 
P(E,L), is a characteristic of each tracer that is indepen¬ 
dent of the host potential. The oPDF can also be shown to 
be equivalent to Jeans theorem, which is the starting point 
for constructing DFs for steady-state tracers in most previ¬ 
ous studies. 

Starting solely from this oPDF, we have developed a 
likelihood estimator to infer the potential of a spherically 
symmetric halo. The method improves over previous f(E, L) 
DF methods in making no assumption about the tracer 
characteristic functions, P(E,L). We achieve this by ap¬ 
proximating the prior distribution of orbits, P(E,L), by 
their empirical distribution once a halo potential is assumed, 
and marginalize over this distribution. The approximation 
of P(E, L) by its empirical version introduces strong shot 
noise, which is suppressed by binning the data radially. 

To test the performance of the likelihood estimator we 
have created Monte-Carlo samples of steady-state tracers 
from a realistic phase-space DF for Milky Way halo stars. 
The DF of these samples is constructed to be in a steady 
state but also makes additional assumptions about the orbit 
population. Applying our estimator to these samples, we 
find it to be unbiased. Comparing our estimates with those 
from a likelihood estimator that uses the correct form of 
the underlying full DF, our estimated errorbars are only 
increased slightly (~ 20%), while avoiding having to assume 
any functional form for the DF. Such a likelihood estimator 
can be easily embedded into a Bayesian framework. 

Expressed in action-angle coordinates, the oPDF re¬ 
duces to the random phase principle proposed in BL04, 
which has been used to construct minimum distance esti¬ 
mators of the potential. When applied to the inference of 
an NFW potential, the minimum distance estimators suf¬ 
fer from a strong degeneracy in halo parameters, reflecting 
the fact that they only constrain the halo mass inside a 
tracer-specific radius. While this degeneracy is an obvious 
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disadvantage of these methods, it also opens a door to non- 
parametric reconstruction of the potential profile (or rota¬ 
tion curve) due to its independence on the shape of the pro¬ 
posed profile. Taking advantage of this shape-independence, 
we have developed a non-parametric “phase-mark” method 
to reconstruct the potential profile, by fitting elementary 
profiles to radially split subsamples of the tracer to mark 
the characteristic mass in each radial bin. Applied to the 
Monte-Carlo samples, we have shown that the phase-mark 
correctly reconstructs the true potential without making any 
assumptions about its shape. Such reconstructed profiles can 
be further fitted to provide parametric constraints on the po¬ 
tential. We find that the constraining power of such fits can 
be as good as that of the likelihood method and the tight¬ 
est constraint is obtained with only two radial subsamples. 
The phase-mark method is more intuitive for recovering the 
potential profile due to its non-parametric nature. It is also 
computationally much faster. 

Both the likelihood estimator and the phase-mark are 
able to break the degeneracy between mass and concentra¬ 
tion and constrain the shape of the halo mass profile over a 
large radial range. The generality of the oPDF also means 
that our methods can be applied to tracers with multiple 
components, but without the necessity of modelling each 
component separately. In the current form, the new meth¬ 
ods developed in this paper can serve as a powerful tool to 
study the dynamical status of simulated haloes. They also 
offer a promising way to constrain the mass of the Milky 
Way halo with real data, once further extended and tested 
to work with observational errors and incompleteness. 
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APPENDIX A: AD DISTRIBUTION 

The theoretical distribution of the AD distance (Eq. 1221) un¬ 
der the null hypothesis can be calculated with Monte-Carlo 
simulations. Specifically, we generate a number of indepen¬ 
dent random samples, and calculate the AD distance, D, 
for each of them. Each sample consists of N independent 
observations of a uniformly distributed variable 6 £ [0, 1]. 
In Fig. IA1I we show that the distribution of ln(D) (Eq. 1221) 
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Figure Al. Binormal fit to the distribution of the AD statis¬ 
tic. The data are generated from an ensemble of 50 000 random 
samples, of size N = 5000 each. For each of these samples, the 
AD statistic, D, is calculated. The empirical distribution of ln(D) 
is plotted as a histogram. The solid line passing through the his¬ 
togram is a bi-normal fit according to Equation dAll) , which is the 
sum of two Gaussian components (dashed lines). For comparison, 
we also plot the BL04 fit which is only designed to describe the 
tail of the distribution. 


can be well fit by the sum of two normal distributions of the 
form 

P (InD) = wJ\T (In D, hi , ax) + (1 — w)J\f (In D, [i 2 , <t 2 ), 

(Al) 

where AT (x, ft, a) is the standard normal probability func¬ 
tion of x with mean /r and standard deviation a. The best 
fit parameters are w = 0.569, /ri = —0.570, <ti = 0.511, 
fi 2 = 0.227, <J 2 = 0.569. Compared with the fitting function 
in BL04 designed to fit the tail of the distribution, the bi¬ 
normal PDF fits well the whole range of the distribution, 
which is important for likelihood analysis. 

The distribution has barely any dependence on the sam¬ 
ple size N. For systems as small as N = 5, we find our fitting 
still describes the empirical AD distribution very well. We 
also verified that the mean phase distribution can be well 
approximated by the normal distribution, for systems with 
N > 5. 







